REMARKS 

Claims 1-36 were pending in this application. In a Final Office Action dated August 24,
2007, claims 1-36 were rejected.

Applicants are amending claims 1, 6, 12, 13, 24 and 36 in this Amendment and Response.
These amendments have been made to clarify the claimed subject matter and their entry is 
respectfully requested. 

In view of the Amendments herein and the Remarks that follow, Applicants respectfully 
request that the Examiner reconsider all outstanding objections and rejections, and withdraw 
them. 

Summary of Interview 

Applicant's representative thanks Examiner Paras Shah and Supervisory Examiner 
Patrick Edouard for their time in conducting an interview on October 4, 2007. The relevant
portions of this discussion are summarized herein in accordance with MPEP §713.04. 

During the interview, Applicant's representative and the Examiners discussed the 
rejection of claims 1, 3-5, 13 and 15-23 under 35 USC 112. Through this discussion, consensus
was reached that the amendment of claim language which recites "configured to" to language 
which recites "for" would resolve any indefiniteness in the base claims. 

Applicant's representative and the Examiners further discussed the rejection of 
independent claims 1, 6, 12, 13, 24 and 36 under 35 USC 103(a). During this discussion, 
argument was put forth that the combination of references failed to teach each and every element 
of the claimed invention. Specifically, substantial explanation was given as to why Su failed to 
disclose an iterative method. These arguments and explanations are summarized below. Based on 
these arguments, consensus was reached that the combination of references did not disclose the 
claimed subject matter. The Applicant's representative also agreed to make amendments to the
claims to clarify the claimed subject matter and facilitate prosecution. 

Response to Rejection Under 35 USC § 112, Paragraph 2 

In the 7th, 8th and 9th paragraphs of the Office Action, the Examiner rejects claims 1, 3-5,
13 and 15-23 under 35 USC 112, Paragraph 2 as allegedly failing to point out and distinctly
claim the subject matter which Applicants regard as invention. Specifically, the Examiner asserts 
that the phrase "configured to" renders the claims indefinite since it suggests optional language. 

As suggested by the Examiner, independent claims 1 and 13 have been amended to recite 
"for" in place of "configured to". Based on these amendments, Applicants submit that 
independent claims 1 and 13 and dependent claims 15-23 point out and distinctly claim the 
subject matter which Applicants regard as the invention. 

Response to Rejection Under 35 USC 103(a) 

In the 10th and 11th paragraphs of the Office Action, the Examiner rejects claims 1, 3, 6,
8, 11-13, 20-24 and 31-36 under 35 USC 103(a) as allegedly being unpatentable over Su et al.
(In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics,
1994) in view of Jurafsky et al. (Speech and Language Processing: An Introduction to Natural 
Language Processing). This rejection is respectfully traversed. 

The claimed invention is directed to systems, methods and apparatus which use a
vocabulary comprising tokens to iteratively identify compounds having a plurality of lengths
within the text corpus. At each iteration, a set of n-grams having the same length is identified and
n-grams are added to the vocabulary. The vocabulary is rebuilt at each iteration based on the 
added n-grams. 
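
By way of illustration only, and without characterizing or limiting the claims, the iterative process summarized above may be sketched as follows. The function names, the frequency threshold used to select n-grams, and the sample corpus are assumptions introduced solely for this sketch; they are not drawn from the specification or from the cited references.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous n-gram in the token sequence as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def identify_compounds(tokens, max_len=3, min_count=2):
    """Iterate over n-gram lengths, rebuilding the vocabulary each time.

    The selection rule (raw count >= min_count) is an assumed placeholder
    for whatever criterion is actually used to pick compounds.
    """
    # Initial vocabulary: the single tokens extracted from the corpus.
    vocabulary = {(t,) for t in tokens}
    compounds = []
    for n in range(2, max_len + 1):
        # Each iteration considers n-grams of one length only.
        counts = Counter(ngrams(tokens, n))
        selected = sorted(g for g, c in counts.items() if c >= min_count)
        compounds.extend(selected)
        # Rebuild the vocabulary from the n-grams added in this iteration.
        vocabulary = vocabulary | set(selected)
    return vocabulary, compounds


# Hypothetical sample corpus, already tokenized on whitespace.
tokens = "new york city is big and new york city is old".split()
vocabulary, compounds = identify_compounds(tokens)
```

In this sketch a single loop yields compounds of more than one length (here, lengths two and three), and the vocabulary after the final iteration contains tokens together with compounds of both lengths.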

To facilitate prosecution, the independent claims have been amended to clarify that the
claimed limitations are directed to an iterative method wherein elements such as the length of
n-grams and the vocabulary are modified at each iteration. Specifically, independent claims 1, 6, 12,
13, 24 and 36 have been amended to recite elements similar to:

iteratively identifying compounds having a plurality of lengths within the text corpus and 

rebuilding the vocabulary based on the identified compounds having the plurality of 
lengths, each compound comprising a plurality of tokens 

Su does not disclose these features. The system in Su is modeled as a two-class classification
problem wherein the classifier is trained on features such as mutual information calculated from a
training corpus labeled using n-grams of a fixed size. Su discloses a comparison of results obtained
from using a classifier trained using bigrams and a classifier trained using trigrams.

Specifically, Su does not disclose "iteratively identifying compounds having a plurality of 
lengths within the text corpus". In his analysis and during the telephone interview, the Examiner 
supported the rejection of this element by citing a portion of Su which discloses building bigram and
trigram classifiers. The bigram and trigram classifiers in Su are compared to evaluate the accuracy (see
Abstract) and the distribution statistics (see Tables 1 and 2) of the method outlined in Su. In Su, the
bigram and trigram classifiers are built independently, producing separate classifiers which either 
identify bigrams or trigrams. Therefore, Su teaches away from the claimed invention of "identifying 
compounds having a plurality of lengths within the text corpus" as Su is limited to the identification of 
compounds of only one length per classification model. There is nothing in Su to suggest or even hint 
at iteratively combining the 2-gram and 3-gram classifiers for "iteratively identifying compounds having
a plurality of lengths within the text corpus". 

Su further fails to provide a construction which supports both "a vocabulary comprising tokens
extracted from a text corpus" and "rebuilding the vocabulary based on the identified compounds having
a plurality of lengths". In his rejection, the Examiner cites a portion of Su which discloses the generation
of a training corpus used to construct a classifier. As discussed above, Su discloses creating either a
bigram classifier or a trigram classifier and therefore does not teach rebuilding the training corpus 
"based on the identified compounds having a plurality of lengths". 

Jurafsky does not remedy the deficiencies of Su. Jurafsky is a textbook which outlines standard
techniques in speech and language processing. Jurafsky merely discloses "backoff," an interpolation
technique used to calculate the likelihood of n-grams based on lower-order n-grams. Jurafsky does not teach
or suggest compound identification or "a vocabulary comprising tokens extracted from a text corpus".
Accordingly, Jurafsky fails to disclose or suggest "iteratively identifying compounds having a plurality 
of lengths within the text corpus" and "rebuilding the vocabulary based on the identified compounds 
having a plurality of lengths". 

Based on at least the above, Applicants submit that independent claims 1, 6, 12, 13, 24 and 36
are patentably distinguishable over Su and Jurafsky, alone or in any combination. Claims 3-6, 8-11, 14,
16-23, 25, 27-35 depend from claims 1, 6, 12, 13, 24 and 36. Additionally, claims 3-6, 8-11, 14, 16-23, 
25, 27-35 recite features not disclosed by the cited art. Thus, Applicants submit that claims 3-6, 8-11, 
14, 16-23, 25, 27-35 are patentably distinguishable over the cited art. 

On the basis of the above, Applicants respectfully submit that the pending claims are patentable 
over the cited art. The early allowance of all claims herein is requested. If the Examiner believes that 
direct contact with the Applicants' attorney will advance the prosecution of this case, the Examiner is 
encouraged to contact the undersigned as indicated below. 

Respectfully Submitted,
Franz, et al. 



Date: October 24, 2007 By: /Brian Hoffman/ 

Brian M. Hoffman, Reg. No. 39,713 
Attorney for Applicant 
Fenwick & West LLP 
801 California Street 
Mountain View, CA 94041 
Tel.: (415)875-2484 
Fax: (415)281-1350 